Estimating the COVID-19 infection fatality ratio accounting for seroreversion using statistical modelling

Background The infection fatality ratio (IFR) is a key statistic for estimating the burden of coronavirus disease 2019 (COVID-19) and has been continuously debated throughout the COVID-19 pandemic. The age-specific IFR can be quantified using antibody surveys to estimate total infections, but requires consideration of delay-distributions from time from infection to seroconversion, time to death, and time to seroreversion (i.e. antibody waning) alongside serologic test sensitivity and specificity. Previous IFR estimates have not fully propagated uncertainty or accounted for these potential biases, particularly seroreversion. Methods We built a Bayesian statistical model that incorporates these factors and applied this model to simulated data and 10 serologic studies from different countries. Results We demonstrate that seroreversion becomes a crucial factor as time accrues but is less important during first-wave, short-term dynamics. We additionally show that disaggregating surveys by regions with higher versus lower disease burden can inform serologic test specificity estimates. The overall IFR in each setting was estimated at 0.49–2.53%. Conclusion We developed a robust statistical framework to account for full uncertainties in the parameters determining IFR. We provide code for others to apply these methods to further datasets and future epidemics.


Supplementary Methods
Symptom onset to seroreversion parameter estimation We estimated the time to seroreversion using an extended dataset from the previously published longitudinal Abbott SARS-CoV-2 IgG assay data 1 as described in the main text. Of the 109 convalescent participants analyzed, eight did not seroconvert during follow-up (i.e. they were negative at baseline and remained so during the follow-up period). Among the final 101 participants, 72 were female and the median age was 47 years (range: 21 -65 years). Seronegative status (i.e. the outcome) was defined as titres below an optical density of 1.4. 1 From the Weibull survival model estimated using the `survival` R-package, 2,3 we found that the Weibull shape and scale parameters were 2.32 and 215.5., respectively. In order to conform with the mathematical derivation of the model (Equation 16 above), we subtracted 13.3 (days) from the Weibull scale parameter to account for the time of symptom onset to seroconversion (as an approximation). 4 Onset-outcome delay distributions and serological test characteristics Prior distributions are given in Supplementary Table 1. The delay from onset of symptoms to death was assumed to follow a gamma distribution with a mean of 14.8 days and coefficient of variation of 0.85. 5 The delay from onset of symptoms to seroconversion was assumed to follow an exponential distribution with a mean time of seroconversion of 13.3 days. 6 For both distributions, the incubation period was assumed to be fixed at five days, 7,8 thereby providing overall means of onset of infection to death and seroconversion of 19.8 and 18.3 days, respectively. Priors for the onset-delay means were truncated Gaussian distributions while the onset-to-death coefficient of variation had a Beta distributed prior (Supplementary Table 1). When seroreversion was considered, the prior for the rate of seroreversion after seroconversion was assumed to have a truncated Gaussian distribution.
The prior distributions for the age-specific IFR estimates were assumed to be uniform with minimum and maximum values of 0% and 40%, respectively. The 40% maximum was selected as not more than three-times the upper credible interval of three early IFR studies. [9][10][11] Spline curves were fit with five knots to estimate the incidence of infections over time, with uniform priors on the x-and y-axis positions. The minimum and maximum values of the y-axis positions were zero and the total population size, respectively, while the maximum x-axis position was allowed to range from fourteen days prior to the last observed death date to the last observed death date.
To estimate the proportion of infections occurring in each age group, we assumed that the attack rate was proportional to the population fraction within that age group multiplied by an additional scaling parameter assumed to follow a truncated Gaussian distribution. The modelled seroprevalence by age was fit to the age specific seroprevalence data. Given that serostudies were conducted over a given timeframe (e.g. two-weeks), we took the average of the model seroprevalences over the serostudy time-period to propagate uncertainty in collection times. These modelled prevalences were then compared with the reported seroprevalence.
The prior distributions for the sensitivity and specificity were assumed to follow beta distributions based on the number of true positives that were correctly identified, the number of true negatives correctly identified, and the total number of samples tested. To allow for some uncertainty --even among studies that found 100% specificity during serovalidation --we added 0.5 to our Beta scale and shape parameters (the Jeffreys prior). For the subset of studies where regional seroprevalence information was available (6/10), we used the posterior estimates of sensitivity and specificity from the region-based models (see below) to re-inform the priors for the IFR model. The new beta distribution shape parameters were calculated from the regionalbased model sensitivity and specificity posteriors using a method of moments approach with `fitdistplusr` R-package 12 (Supplementary table 5).
During model fitting, 10,000 burnin-in and 10,000 sampling iterations with 50 rungs across 10 chains were considered. The thermodynamic power between rungs was set to 3.0 to ensure high swap rates among the hotter chains for all models except: Brazil, England, Spain, Italy, New York State no-seroreversion and seroreversion models and Brazil, England, Italy, Sweden and New York State care-homes excluded models where the thermodynamic power was set to 2.0.
Obtaining specificity priors from regional data A simplified version of the main statistical model described above was used to estimate serological test specificity in studies where data were available by region within the larger serosurvey. As before, we incorporated test sensitivity and specificity into this model using the reported validation data from each study as priors, and we then estimated regional and agespecific seroprevalence and deaths. Specificity and sensitivity were assumed constant across regions and ages. The age-specific IFRs were assumed to be the same in each region, but incorporated region-specific demography so that the overall regional IFRs could vary (e.g. London, England has a much younger population than the rest of the country, and therefore a lower overall IFR). The delays from onset-death and onset-seroconversion were removed for analytical tractability, and we used cumulative mortality data up to the midpoint of the serosurvey, making the assumption that most deaths and seroconversion had occurred by the time of the serosurvey. This assumption is reasonable for serosurveys conducted some weeks after the first wave of the epidemic. We did not have seroprevalence data broken down by both age and region at the same time, but we fit the model to both the marginal age and regional seroprevalence, assuming the same risk ratios of being seropositive by age in all regions. We did not include seroreversion in this analysis. The model was fit with RStan using 4 chains, each having 10,000 burn-in and 10,000 sampling iterations 13 . The underlying code is available on Github (mrc-ide/reestimate_covidIFR_analysis/analyses/Rgn_Mod_Stan). Code is also available in the same folder showing simulation of an epidemic with different regional attack rates, in order to test model performance ( Figures 4B & 4C Similarly, the spline knots were reparameterized relative to the last observed day, where E was the last observed day and E-14 was fourteen days prior. The first knot was always fixed at one to allow for a common time origin. The spline y-positions were similarly set relative to the third y-position, which was allowed to range between zero and the approximate size of the population (popN).

Supplementary Note 1. Comparison of the number of serosurveys
Supplementary Figure 1 -Estimating IFR from one versus two serosurveys. Data was simulated for three age groups (ma1, ma2, ma3) with IFRs of 0.1%, 1%, and 10%, respectively. Follow-up time was assumed to be 200 days. The simulations differed in the number of simulated serosureys with (A) consisting of a single serosurvey ranging from days 145-155 versus (B) consisting of two serosurveys: days 120-130 and 170-180, respectively. In both simulations, we assume that 0.1% of the population was sampled and that the simulated serologic test had a sensitivity and specificity of 0.85% and 0.95%, respectively. Although the statistical model can capture the true simulated IFR values in the 95% credible intervals (grey) in both scenarios, there is greater uncertainty when only a single serosurvey is considered. The IFR values are shown as a probability.

Supplementary Note 2. First wave data
Seroprevalence data Identification of relevant serological studies was carried out using the `SeroTracker` dashboard 14 , a continuously updated systematic review. Observed seroprevalences, disaggregated by age and location where possible, were extracted. For the majority of studies (7/10), raw counts for the number of seropositives and number tested were available. However, this information was not available for three studies (conducted in Italy, Sweden and Denmark) and reported seroprevalence was extracted instead, alongside the stated confidence intervals. As a result, we used the reported confidence intervals to capture standard errors and used a logittransformation instead of a binomial approach in our statistical model likelihood (i.e. a normal likelihood on the logit-scale). Further uncertainties in the data extraction process include: • For the Swedish study, seroprevalence data was extracted from public health reports that were being constantly updated and did not provide exact dates of the period in which the serostudy was conducted. It was therefore assumed that the serostudy was carried out in contiguous weeks across the study period, based on the provided figure in the report (accessed September 11, 2020). • In 5/10 studies (Zurich, Switzerland; Denmark; Sweden; Netherlands [second time point]), the age-specific seroprevalences and sample sizes were from broad age groups or did not overlap well with reported death age groups, or only a national prevalence was available. As a result, we assumed a constant attack rate (seroprevalence) across age groups for these studies. • In instances where the reported serologic age groups did not directly match with reported cumulative death age groups, we assumed similar seroprevalence values in contiguous age groups or used the overall average for missing age groups.

COVID-19 mortality data and demographics
In a limited number of cases, Ministries of Health have retrospectively removed previously reported deaths, leading to negative reported deaths on particular days -these observations were set to zero. On May 18 for the New York data, nearly 4,500 deaths were retrospectively added, which caused an outlier in daily observed deaths. Inclusion (not shown) or exclusion of this day's data resulted in very similar New York IFR estimates because it was difficult for the model to capture this spike. MCMC chains appeared much more stable when this day's data was excluded, so were used for the final result. Supplementary Table 2 -Data Extraction Sources: The sources of data for IFR estimation in the first wave (reference numbers). All data are publicly available. For some regions, population demographic data was taken from the City Population (CP) website: http://citypopulation.de/ * Sweden seroprevalences were limited April 20 -June 12, 2020 when multiple regions were considered (Jämtland-Härjedalen, Jönköping, Kalmar, Skåne, Stockholm, Uppsala, Västerbotten, Västra Götaland, and Örebro). These provinces include the majority of the Swedish population and were assumed to be nationally representative. ተ Serovalidation data has not yet been released for the Italy study. As a result, serovalidation data from a comparable study in San Francisco, USA, which used the same Abbot seroassay, were used as priors for the regional model and re-estimation of the test sensitivity and specificity.
Supplementary Figure 2 -Seroprevalence versus Age: Observed seroprevalence by age (mean age within each age group plotted) and 95% confidence intervals for the seven studies where age-specific information was reported.
The results of the latest seroprevalence survey are plotted among locations where multiple seroprevalence surveys had been conducted (except for the Netherlands where age specific seroprevalence was only available in the first survey). In our analysis we assumed a uniform seroprevalence across age groups among the remaining three studies included that did not contain age-specific seroprevalence data. Supplementary table 5 -Observed serovalidation results for each included studies and posterior estimates of sensitivity and specificity: For each study, the number of test positives versus true positives reported were used to parameterize the sensitivity beta-distributed prior in the IFR model. Similarly, the number of test negatives versus true negatives reported were used to parameterize the specificity beta-distributed prior in the age-specific IFR model (Table 1, main text). For a subset of studies (6/10), regional information was used to re-estimate the true study sensitivity and specificity (using the region-based model described below). The resulting posterior medians and 95% credible intervals by study and model (without and with seroreversion) are provided for the initial simpler regionbased model and the subsequent main statistical model.

Posterior predictive checks
Supplementary Figure 3 -Posterior Draws of Modelled Seroprevalence without Seroreversion for the Oldest age group: Results are presented from the statistical modelling framework where seroreversion was not included. For each study and its associated model fits, we selected 100 posterior draws of the modelled true seroprevelance (yellow) based on their posterior probabilities. Due to space constraints, we only show here the results for the oldest age group, although younger age groups showed similar results. Given this true inferred seroprevalence, the expected observed seroprevalence, with the Rogan-Gladen correction, is also displayed (blue). The observed seroprevalence for the oldest age group is indicated along with its 95% confidence interval in black with the serostudy time-period demarcated by a grey box. Overall, observed seroprevalences matched closely with modelled seroprevalence and varied in some instances due to the modelled differences in the distributions of infections across age groups (Supplementary Data 1).
Supplementary Figure 4 -Posterior Predictive Check: Inferred Deaths without Seroreversion for the Oldest age group: Results are presented from the statistical modelling framework where seroreversion was not included. For each study and its associated model fits we selected 100 posterior draws of the modelled IFR and delay of infection onset to death based on the posterior probabilities (grey). Inferred deaths were projected forward in time according to the gamma distribution fit and compared to the age-specific time-series deaths (i.e. model input: blue). Due to space constraints, we only show here the results for the oldest age group, although younger age groups showed similar results.
Supplementary Figure 5 -Posterior Draws of Modelled Seroprevalence with Seroreversion for the Oldest age group: Results are presented from the statistical modelling framework where seroreversion was explicitly included. The seroreversion sensitivity analysis was used to present a conservative estimate of the IFR, as we assumed a fast rate of antibody waning from the Abbott assay (see main text). For each study and its associated model fits, we selected 100 posterior draws of the modelled true seroprevelance (yellow) according to the posterior probabilities. Due to space constraints, we only show here the results for the oldest age group, although younger age groups showed similar results. Given the true inferred seroprevalence, the expected observed seroprevalence, with the Rogan-Gladen correction, is also displayed (blue). The observed seroprevalence for the oldest age group is indicated along with its 95% confidence interval in black with the serostudy time-period demarcated by a grey box. Overall, observed seroprevalences matched closely with modelled seroprevalence and varied in some instances due to the modelled differences in the distributions of infections across age groups (Supplementary Data 1). The effect of seroreversion later in the pandemic is apparent, particularly in Italy.
Supplementary Figure 6 -Posterior Predictive Check: Inferred Deaths with Seroreversion for the Oldest age group: Results are presented from the statistical modelling framework where seroreversion was explicitly included. The seroreversion sensitivity analysis was used to present a conservative estimate of the IFR, as we assumed a fast rate of antibody waning from the Abbott assay (see main text). For each study and its associated model fits, we selected 100 posterior draws of the modelled IFR and delay of infection onset to death based on the posterior probabilities (grey). Inferred deaths were projected forward in time according to the gamma distribution fit and compared to the recasted age-specific time-series deaths (i.e. model input: blue). Due to space constraints, we only show here the results for the oldest age group, although younger age groups showed similar results.
Specificity estimates from regional data Supplementary Figure 7 -Regional model fits to data for six studies with regional serosurvey data: For each country: (A) relationship between seroprevalence and COVID-19 mortality per 100,000 population in the data (observed seroprevalence not adjusted for test sensitivity and specificity) and model-estimated true seroprevalence (adjusted for test sensitivity and specificity) and mortality with 95% credible intervals. (B) as A but showing modelestimated observed seroprevalence (as would be observed given the model-estimated sensitivity and specificity). (C) Specificity posterior distribution (solid line) versus the value reported in the study (dashed line) (D) Observed versus model-estimated age-specific seroprevalence (adjusted for sensitivity and specificity).

Spain
Italy Denmark (average seroprevalence across all ages assumed to be the same due to lack of overlap of mortality and seroprevalence data age groups)

Supplementary Note 4. Pooled IFR calculation
We pooled the posterior age-specific IFR estimates across studies using weighted log-linear regression, where weights were calculated as the posterior precision of age-and study-specific IFR estimates. After fitting the weighted log-linear regression model, we found that there was heteroskedasticity in the model residuals (Supplementary Figure 8). To account for this, we modelled variance as a quadratic equation with respect to the mean age within each age group for each study categorized into one of the 10-year age bands ( Table 3). Our final model therefore consists of a fitted linear model for the mean of the log IFR by age, with the variance in log IFR changing with age. Under the assumption of normally distributed residuals in log space, this corresponds to a log-normal model in linear space, in which the mean is a function of both the mean and the variance in log IFR. This log-normal model was used to produce the final predictive intervals in 5-year age bands.
We then standardized these age-specific estimates to the population demographies of four countries: Madagascar, Nicaragua, Grenada and Malta, which have previously been identified as representative countries for each of the World Bank Income Strata: LIC, LMIC, UMIC, and HIC 15 . The overall IFR prediction interval was estimated using a Monte Carlo sampling approach: repeatedly drawing from the appropriate log-normal distribution (as described above) and combining values as weighted sums, where weights were calculated from the age-specific demographics. Pooled log linear models fit with demographic weights by age groups resulted in similar age-specific pooled IFR estimates (not shown).